Very large vocabulary speech recognition system for automatic transcription of Czech broadcast programs
Abstract
This paper describes the first speech recognition system capable of transcribing a wide range of spoken broadcast programs in the Czech language with an out-of-vocabulary (OOV) rate below 3 per cent. To achieve that level we had to a) create an optimized 200k-word vocabulary with multiple text and pronunciation forms, b) extract an appropriate language model from a 300M-word text corpus, and c) develop our own decoder specially designed for a lexicon of that size. The system was tested on various types of broadcast programs with the following results: the Czech part of the European COST278 database of TV news (71.5 % accuracy on complete news streams, 82.7 % on their clean parts), radio news (80.2 %), read commentaries (78.6 %), broadcast debates (74.3 %) and recordings of the state presidents’ speeches (85.8 %).
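As a rough illustration of the OOV figure quoted in the abstract, the sketch below computes an OOV rate as the share of running words in a transcript that are missing from the recognizer's vocabulary. The function name `oov_rate` and the toy word lists are hypothetical and are not taken from the paper; the authors' actual tokenization and evaluation procedure may differ.

```python
def oov_rate(tokens, vocabulary):
    """Return the fraction of running words in `tokens` not found in `vocabulary`."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    vocab = set(vocabulary)
    # Count every occurrence of a word the vocabulary does not cover.
    misses = sum(1 for tok in tokens if tok not in vocab)
    return misses / len(tokens)


# Toy data (hypothetical): a four-word vocabulary and a five-word transcript.
vocabulary = {"dobrý", "den", "zprávy", "dnes"}
tokens = ["dobrý", "den", "dnes", "zprávy", "počasí"]

rate = oov_rate(tokens, vocabulary)
print(f"OOV rate: {rate:.1%}")
```

In this toy case one of the five running words is outside the vocabulary. A figure below 3 per cent, as reported in the abstract, would mean fewer than 3 in 100 running words are uncovered.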
Similar resources
Fully automated system for Czech spoken broadcast transcription with very large (300k+) lexicon
We present a system developed for fully automated processing of Czech spoken broadcast programs. It includes modules for unsupervised segmentation of the audio stream, speaker and gender recognition followed by speaker adaptation, and our own speech decoder designed for extremely large vocabularies. Compared to our previous results reported in 2004, the new system reduced the WER (evaluated on the Czec...
An improved preprocessor for the automatic transcription of broadcast news audio stream
This paper deals with the preprocessing of the broadcast news (BN) audio stream for automatic transcription purposes. The preprocessing consists of automatic segmentation followed by broad-class segment identification. The former is capable of detecting speaker and/or acoustic changes in the BN audio stream with a precision of 82.75%. The latter acts as a filter that removes no...
MAP Based Speaker Adaptation in Very Large Vocabulary Speech Recognition of Czech
The paper deals with the problem of efficient adaptation of speech recognition systems to individual users. The goal is to achieve better performance in specific applications where one known speaker is expected. In our approach we adopt the MAP (Maximum A Posteriori) method for this purpose. The MAP based formulae for the adaptation of the HMM (Hidden Markov Model) parameters are described. Sev...
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...
Continual On-line Monitoring of Czech
In the paper we describe the development of the first practical system that performs automatic on-line monitoring of Czech broadcast stations. It is based on our own speech recognition server that operates with a 300K-word lexicon and a 2.3 RT factor. For true on-line service, several servers are connected to the platform that controls acoustic stream segmentation, distribution of data to the serve...